Feature Engineering for UFC Predictions
Introduction
Mixed martial arts (MMA) has become one of the most electric and fastest-growing sports of the last decade, and the UFC has long been the premier MMA organization in the world. It is therefore the natural promotion to analyze: the UFC maintains the most complete statistics on both fighter attributes and in-cage performance, so it makes sense to break down its fights to build a model. The dataset I will be using comes from Kaggle (https://www.kaggle.com/datasets/mdabbert/ultimate-ufc-dataset) and has been updated through 2024-11-09. A preliminary analysis shows that the dataset contains nearly 6,500 observations and 118 variables. The number of observations is not particularly challenging; however, the 118 variables must be filtered if we are to build any predictive models.
Upon inspection, we find many columns that are not directly relevant to a fighter’s performance statistics, such as venue location, rankings, and betting odds. Therefore, we will clean the data before proceeding.
Exploratory Data Analysis
First, let's take a look at what the dataset has to offer. I want to answer the age-old questions: do differences in age, reach, and height make a difference?
Age Difference Win Percentage
We can see from the bar plot that the mean age of winners is 29.9 years, while for losers it is 30.8 years. This conforms with common knowledge, as MMA is typically known to be a “young man’s sport”.
Reach Difference Win Percentage
When it comes to reach, the old sentiment that a longer reach confers an advantage is surprisingly not borne out. While winners do have a slightly longer mean reach of 182.6 cm versus 182.2 cm for losers, the difference of 0.4 centimeters is not statistically significant and can be labeled as negligible.
Height Difference Win Percentage
The same can be said about height: the mean for winners is 178.0 cm versus 177.7 cm for losers.
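Whether a 0.9-year age gap like the one above is meaningful can be checked with a significance test. A minimal sketch on simulated stand-in data (the real winner/loser age vectors would come from the cleaned dataset; the means and spread here are assumptions chosen only to mirror the figures above):

```r
set.seed(1)
# Toy stand-ins for winner and loser ages (means match the bar plot above)
winner_age <- rnorm(500, mean = 29.9, sd = 4)
loser_age  <- rnorm(500, mean = 30.8, sd = 4)
# Welch two-sample t-test: is the winners' mean age lower?
t.test(winner_age, loser_age, alternative = "less")
```

The same test applied to the reach and height columns would make the "negligible difference" claims above precise.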
Fighter Style Clustering
The second major step in feature engineering involves fighter style clustering. A common saying in Mixed Martial Arts (MMA) is that “styles make fights.” I aim to test the validity of this notion by analyzing fighter statistics and assigning each fighter to a cluster that represents their fighting style.
# Extract fighter stats
red_stats <- ufc %>%
select(Fighter = RedFighter, AvgSigStrLanded = RedAvgSigStrLanded, AvgSigStrPct = RedAvgSigStrPct,
AvgSubAtt = RedAvgSubAtt, AvgTDLanded = RedAvgTDLanded, AvgTDPct = RedAvgTDPct,
HeightCms = RedHeightCms, ReachCms = RedReachCms, WeightLbs = RedWeightLbs,
WinsByDecision = RedWinsByDec, WinsByKO = RedWinsByKO, WinsBySubmission = RedWinsBySubmission)
blue_stats <- ufc %>%
select(Fighter = BlueFighter, AvgSigStrLanded = BlueAvgSigStrLanded, AvgSigStrPct = BlueAvgSigStrPct,
AvgSubAtt = BlueAvgSubAtt, AvgTDLanded = BlueAvgTDLanded, AvgTDPct = BlueAvgTDPct,
HeightCms = BlueHeightCms, ReachCms = BlueReachCms, WeightLbs = BlueWeightLbs,
WinsByDecision = BlueWinsByDec, WinsByKO = BlueWinsByKO, WinsBySubmission = BlueWinsBySubmission)
# Combine and average stats per fighter
fighter_stats <- bind_rows(red_stats, blue_stats) %>%
group_by(Fighter) %>%
summarise(across(everything(), ~mean(.x, na.rm = TRUE))) %>%
ungroup()
# Add engineered features and compute percentages
fighter_stats <- fighter_stats %>%
mutate(
# Note: WeightLbs is left in pounds, so BMI here is a proportional proxy rather than the standard kg/m^2 value
BMI = WeightLbs / (HeightCms / 100)^2,
Ape_Ratio = ReachCms / HeightCms,
TotalWins = WinsByDecision + WinsByKO + WinsBySubmission,
PercWinsByDecision = WinsByDecision / TotalWins,
PercWinsByKO = WinsByKO / TotalWins,
PercWinsBySubmission = WinsBySubmission / TotalWins
) %>%
drop_na(PercWinsByDecision, PercWinsByKO, PercWinsBySubmission)
# Select features for clustering
style_features <- fighter_stats %>%
select(AvgSigStrLanded, AvgSigStrPct, AvgSubAtt, AvgTDLanded, AvgTDPct, BMI, Ape_Ratio,
PercWinsByDecision, PercWinsByKO, PercWinsBySubmission)
# Scale and apply K-means
scaled_data <- scale(style_features)
set.seed(123)  # k-means initialization is random; fix a seed for reproducibility
kmeans_result <- kmeans(scaled_data, centers = 3)
# Assign style labels
fighter_stats$StyleCluster <- as.integer(kmeans_result$cluster)
# Merge back to UFC match data
ufc <- ufc %>%
left_join(fighter_stats %>% select(Fighter, StyleCluster), by = c("RedFighter" = "Fighter")) %>%
rename(RedStyle = StyleCluster) %>%
left_join(fighter_stats %>% select(Fighter, StyleCluster), by = c("BlueFighter" = "Fighter")) %>%
rename(BlueStyle = StyleCluster)
ufc$RedStyle <- as.character(ufc$RedStyle)
ufc$BlueStyle <- as.character(ufc$BlueStyle)
style_summary <- fighter_stats %>%
group_by(StyleCluster) %>%
summarise(across(c(AvgSigStrLanded, AvgSigStrPct, AvgSubAtt, AvgTDLanded, AvgTDPct, BMI, Ape_Ratio,
PercWinsByDecision, PercWinsByKO, PercWinsBySubmission), \(x) mean(x, na.rm = TRUE))) %>%
column_to_rownames("StyleCluster")
style_scaled <- scale(style_summary)
style_long <- reshape2::melt(style_scaled)
colnames(style_long) <- c("Style", "Stat", "ZScore")
# Assuming 'Style' column has values 1, 2, 3
styles <- c("Grappler", "Point-Fighter", "Striker")
style_long$Style <- factor(style_long$Style,
levels = c(1, 2, 3),
labels = styles)

After performing K-Means clustering with the goal of identifying three distinct groups, we generated a heatmap of the results. The most immediately apparent features are the Z-scores for Wins by Submission, Wins by Decision, and Wins by KO. These metrics provide a strong indication of what each cluster represents. Based on this, I labeled the clusters as Grappler, Point-Fighter, and Striker, respectively.
Examining additional statistics, we observe that Point-Fighters, on average, have a higher value for AvgSigStrLanded (average significant strikes landed per fight), which is consistent with their tendency to win by decision. Since their fights typically last longer, they have more opportunities to accumulate strikes. In contrast, Strikers display a high AvgSigStrPct (average significant strike percentage), indicating greater efficiency and accuracy in their striking.
Lastly, the metrics AvgTDLanded and AvgTDPct (average takedowns landed and takedown accuracy, respectively) vary across styles. Grapplers score high on both metrics, Point-Fighters fall somewhere in the middle, and Strikers tend to have low values. Altogether, this evidence supports the validity of my cluster labeling.
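The choice of exactly three clusters is itself an assumption; one quick sanity check is the within-cluster sum-of-squares (elbow) curve. A minimal sketch on toy standardized data (with the real data, `scaled_data` from the chunk above would be the input):

```r
set.seed(42)
# Toy stand-in for the scaled fighter-feature matrix
toy <- scale(matrix(rnorm(200 * 10), ncol = 10))
# Total within-cluster sum of squares for k = 1..8; look for the "elbow"
wss <- sapply(1:8, function(k) kmeans(toy, centers = k, nstart = 25)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster SS")
```

If the curve flattens sharply after k = 3, the three-style labeling gets additional support beyond the heatmap interpretation.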
Fight Style Visualization
To visualize this clustering, we will plot each fighter on a 3D Plotly scatter plot, using the first three principal components (PCs) from a PCA to represent each dimension. From the interactive plot, we can observe that the clustering appears to have worked as intended, with each fighter group distinctly positioned along the principal component axes.
# Load required libraries
library(plotly)
# scaled_data is already standardized, so center/scale here are redundant but harmless
pca <- prcomp(scaled_data, center = TRUE, scale. = TRUE)
pca_data <- as.data.frame(pca$x[, 1:3]) # keep first 3 components
# Add fighter names and cluster labels
pca_data$Fighter <- fighter_stats$Fighter
pca_data$StyleCluster <- factor(fighter_stats$StyleCluster,
levels = c(1, 2, 3),
labels = c("Grappler", "Point-Fighter", "Striker"))
# Plot with hover text
plot_ly(data = pca_data,
x = ~PC1, y = ~PC2, z = ~PC3,
type = 'scatter3d',
mode = 'markers',
color = ~StyleCluster,
colors = c("blue", "green", "red"),
text = ~paste("Fighter:", Fighter,
"<br>Style:", StyleCluster),
hoverinfo = 'text') %>%
plotly::layout(title = "Fighter Style Clustering (3D PCA)",
scene = list(
xaxis = list(title = 'PC1'),
yaxis = list(title = 'PC2'),
zaxis = list(title = 'PC3')
))
# Extract loadings for PC1, PC2, and PC3
loadings <- pca$rotation[, 1:3]
# Convert to long format for easy plotting
loading_df <- as_tibble(loadings, rownames = "Feature")

In terms of interpretation based on the loadings:
PC1 Plot
PC1 ranges from Grappling-oriented fighters on the negative end to those with greater Knockout Power on the positive end.
PC2 Plot
PC2 spans from Point-Fighters on the negative end to more aggressive Finishers (via KO or submission) on the positive end.
PC3 Plot
PC3 moves from Submission-Oriented fighters on the negative end to more Efficient All-Rounders on the positive end, characterized by higher accuracy and well-rounded performance metrics.
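These readings come from the loading matrix extracted above; a quick way to eyeball them is a sorted bar chart per component. A minimal sketch on a toy PCA (with the real data, `pca$rotation` from above would be plotted directly):

```r
set.seed(7)
# Toy PCA standing in for the fighter-stats PCA above
X <- matrix(rnorm(100 * 5), ncol = 5,
            dimnames = list(NULL, paste0("Feat", 1:5)))
pca_toy <- prcomp(scale(X))
# Sorted PC1 loadings: large positive/negative values drive the axis
barplot(sort(pca_toy$rotation[, 1]), horiz = TRUE, las = 1,
        xlab = "PC1 loading")
```

Features with large negative loadings anchor one end of the axis and large positive loadings the other, which is exactly how the Grappler-to-Knockout-Power reading of PC1 is derived.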
Elo Engine
The next aspect of feature engineering involves creating an ELO engine to introduce an additional predictor, with the goal of improving model performance. This is based on the idea that momentum plays a significant role in MMA, as it does in many sports. The ELO concept, originally developed for chess, is now widely used across various games and sports as an effective way to quantify player skill.
In essence, a fighter’s ELO rating increases with a win and decreases with a loss. The change is greater if a fighter defeats an opponent with a higher ELO, and smaller if the opponent has a lower ELO. The same logic applies to losses: losing to a lower-rated opponent causes a larger ELO drop than losing to a higher-rated one. All debuting fighters are assigned an initial ELO of 1000, which is then updated after each fight.
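As a quick sanity check on this logic, here is the standard Elo expected-score formula applied to two scenarios (the same formula and K = 40 used in the code below):

```r
# Expected score of a fighter rated r1 against one rated r2
expected_score <- function(r1, r2) 1 / (1 + 10^((r2 - r1) / 400))
k <- 40

# Even match (both 1000): winner gains half of K
k * (1 - expected_score(1000, 1000))  # 20

# Upset: a 1000-rated fighter beating a 1200-rated one gains more
k * (1 - expected_score(1000, 1200))  # ~30.4
```

The gain and the loss are symmetric, so the loser in each case drops by the same amount the winner gains.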
Main Function
expected_score <- function(r1, r2) {
1 / (1 + 10 ^ ((r2 - r1) / 400))
}
update_elo <- function(winner_elo, loser_elo, k = 40) {
expected <- expected_score(winner_elo, loser_elo)
delta <- k * (1 - expected)
list(winner_elo + delta, loser_elo - delta)
}
# Initialize every fighter at 1000 ELO before replaying the fights in order
fighters <- unique(c(ufc$RedFighter, ufc$BlueFighter))
elo_ratings <- setNames(rep(1000, length(fighters)), fighters)

for (i in seq_len(nrow(ufc))) {
  red <- ufc$RedFighter[i]
  blue <- ufc$BlueFighter[i]
  winner <- ufc$Winner[i]
  red_elo <- elo_ratings[red]
  blue_elo <- elo_ratings[blue]
  ufc$RedEloStart[i] <- red_elo
  ufc$BlueEloStart[i] <- blue_elo
  if (winner == "Red") {
    updated <- update_elo(red_elo, blue_elo)
    elo_ratings[red] <- updated[[1]]
    elo_ratings[blue] <- updated[[2]]
  } else if (winner == "Blue") {
    updated <- update_elo(blue_elo, red_elo)
    elo_ratings[blue] <- updated[[1]]
    elo_ratings[red] <- updated[[2]]
  }
  ufc$RedEloEnd[i] <- elo_ratings[red]
  ufc$BlueEloEnd[i] <- elo_ratings[blue]
}

As we can see after printing out the tail, the ELO engine works as intended: the fighter who loses a bout ends up with a lower ELO afterward.
After creating the ELO engine, we still need to consolidate some of the variables so that the model is not overwhelmed with predictors. To do this, each pair of corresponding Blue and Red variables, such as BlueWins and RedWins, will be subtracted to produce a single differential variable such as WinDif. This is done for every such pair, with the Red variable always subtracted from the Blue variable to get the difference.
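A minimal sketch of this Blue-minus-Red differencing on a toy frame (the column names are placeholders; the real data has one such Blue/Red pair per statistic):

```r
toy <- data.frame(
  BlueWins = c(10, 5), RedWins = c(7, 9),
  BlueElo  = c(1040, 980), RedElo = c(1010, 1025)
)
# Each differential is always Blue minus Red
toy$WinDif <- toy$BlueWins - toy$RedWins
toy$EloDif <- toy$BlueElo - toy$RedElo
toy[, c("WinDif", "EloDif")]
```

A positive value therefore always favors the Blue corner, which keeps the sign convention consistent across every differential predictor.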
K Nearest Neighbors
# Recipe for KNN
knn_recipe <- recipe(Winner ~ ., data = ufc_train) %>%
step_impute_median(all_numeric_predictors()) %>%
step_impute_mode(all_nominal_predictors()) %>%
step_dummy(all_nominal_predictors()) %>%
step_zv(all_predictors()) %>%
step_normalize(all_predictors())
# KNN model with tunable neighbors
knn_spec <- nearest_neighbor(neighbors = tune()) %>%
set_engine("kknn") %>%
set_mode("classification")
# Workflow
knn_wflow <- workflow() %>%
add_recipe(knn_recipe) %>%
add_model(knn_spec)
# Grid for neighbors
knn_grid <- grid_regular(
neighbors(range = c(1, 25)),
levels = 10
)
# Tune KNN
tune_res_knn <- tune_grid(
knn_wflow,
resamples = ufc_fold,
grid = knn_grid
)
# Select best model
best_knn <- select_by_one_std_err(
tune_res_knn,
metric = "roc_auc",
neighbors
)
# Finalize and fit model
knn_final <- finalize_workflow(knn_wflow, best_knn)
knn_final <- fit(knn_final, data = ufc_train)
# Predict on test set
ufc_test_preds_knn <- predict(knn_final, new_data = ufc_test, type = "prob") %>%
bind_cols(predict(knn_final, new_data = ufc_test, type = "class")) %>%
bind_cols(ufc_test %>% select(Winner))
# Accuracy and ROC AUC
knn_metrics <- metric_set(accuracy, roc_auc)(
ufc_test_preds_knn,
truth = Winner,
estimate = .pred_class,
.pred_Blue
)
knn_metrics
## # A tibble: 2 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.551
## 2 roc_auc binary 0.570
# Confusion matrix
conf_mat(ufc_test_preds_knn, truth = Winner, estimate = .pred_class)
## Truth
## Prediction Blue Red
## Blue 407 336
## Red 287 358

Running our first model, KNN, we can see that it does not perform well, reaching only 55% accuracy. This is disappointing, albeit predictable, since KNN is a very simple algorithm and lacks the capacity to handle the high dimensionality of this complex UFC dataset.
Regularized Logistic Regression
# Recipe
ufc_recipe <- recipe(Winner ~ ., data = ufc_train) %>%
step_impute_median(all_numeric_predictors()) %>%
step_impute_mode(all_nominal_predictors()) %>%
step_dummy(all_nominal_predictors()) %>%
step_zv(all_predictors()) %>%
step_normalize(all_predictors()) %>%
step_upsample(Winner, over_ratio = 1)
# Elastic Net Model Spec
en_spec_ufc <- logistic_reg(
mixture = tune(),
penalty = tune()
) %>%
set_mode("classification") %>%
set_engine("glmnet")
# Workflow
en_workflow_ufc <- workflow() %>%
add_recipe(ufc_recipe) %>%
add_model(en_spec_ufc)
# Grid for tuning
en_grid <- grid_regular(
penalty(range = c(0, 1), trans = identity_trans()),
mixture(range = c(0, 1)),
levels = 10
)
# Hyperparameter tuning
tune_res_ufc <- tune_grid(
en_workflow_ufc,
resamples = ufc_fold,
grid = en_grid
)
# Select best model (1-SE rule)
best_en_ufc <- select_by_one_std_err(
tune_res_ufc,
metric = "roc_auc",
penalty,
mixture
)
# Finalize and fit
en_final_ufc <- finalize_workflow(en_workflow_ufc, best_en_ufc)
en_final_ufc <- fit(en_final_ufc, data = ufc_train)
# Predict on test set (probabilities and class)
ufc_test_preds <- predict(en_final_ufc, new_data = ufc_test, type = "prob") %>%
bind_cols(predict(en_final_ufc, new_data = ufc_test, type = "class")) %>%
bind_cols(ufc_test %>% select(Winner))
# Evaluate: Accuracy and ROC AUC
log_reg_metrics <- metric_set(accuracy, roc_auc)(
ufc_test_preds,
truth = Winner,
estimate = .pred_class,
.pred_Blue
)
log_reg_metrics
## # A tibble: 2 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.616
## 2 roc_auc binary 0.657
# Confusion matrix
conf_mat(ufc_test_preds, truth = Winner, estimate = .pred_class)
## Truth
## Prediction Blue Red
## Blue 410 249
## Red 284 445
# Extract coefficient estimates
log_reg_importance <- en_final_ufc %>%
extract_fit_parsnip() %>%
tidy() %>%
filter(term != "(Intercept)") %>%
mutate(abs_estimate = abs(estimate)) %>%
arrange(desc(abs_estimate))
# View top 20 most important features
log_reg_importance %>%
top_n(20, abs_estimate) %>%
ggplot(aes(x = reorder(term, abs_estimate), y = abs_estimate)) +
geom_col() +
coord_flip() +
labs(x = "", y = "Absolute Coefficient") +
theme_minimal()

Our next model, Regularized Logistic Regression, performs significantly better than KNN due to its optimization and regularization capabilities. Through regularization, the model adds penalty terms to the loss function, which shrink the coefficients of features. This process allows the model to assign different weights to each feature, prioritizing the most predictive ones while reducing the influence of less useful or noisy features, sometimes shrinking them to zero (especially with L1 regularization). In doing so, we can calculate the coefficient of each feature. This gives us a preliminary look at how our added features, such as a fighter’s style and ELO difference, affect model accuracy. From the plot, we can see that, by a large margin, AgeDif plays the biggest role. EloDif ranks fourth, with an absolute coefficient of approximately 0.097. Our StyleMatchup feature from the fighter style cluster is in seventh place with an absolute coefficient of approximately 0.087. This is promising, as we can see that, while not revolutionary, the added features do play a significant role as predictors.
Random Forest
# Recipe with preprocessing
rec_ufc <- recipe(Winner ~ ., data = ufc_train) %>%
step_impute_median(all_numeric_predictors()) %>%
step_impute_mode(all_nominal_predictors()) %>%
step_dummy(all_nominal_predictors()) %>%
step_zv(all_predictors()) %>%
step_normalize(all_predictors())
# Random Forest model spec with permutation importance
rf_class_spec <- rand_forest(
mtry = tune(),
trees = tune(),
min_n = tune()
) %>%
set_engine("ranger", importance = "permutation") %>%
set_mode("classification")
# Workflow
rf_class_wf <- workflow() %>%
add_model(rf_class_spec) %>%
add_recipe(rec_ufc)
# Tuning grid
rf_grid <- grid_regular(
mtry(range = c(1, 6)),
trees(range = c(200, 600)),
min_n(range = c(10, 20)),
levels = 5
)
# Try to load or tune
rf_tune_class <- tryCatch({
load("elo_cluster_rf.rda")
rf_tune_class
}, error = function(e) {
rf_tune_class <- tune_grid(
rf_class_wf,
resamples = ufc_fold,
grid = rf_grid
)
save(rf_tune_class, file = "elo_cluster_rf.rda")
rf_tune_class
})
# Select best model
best_rf_class <- select_best(rf_tune_class, metric = "roc_auc")
# Finalize and fit
final_rf_model <- finalize_workflow(rf_class_wf, best_rf_class)
final_rf_model <- fit(final_rf_model, data = ufc_train)
# Predict on test set
rf_preds <- predict(final_rf_model, new_data = ufc_test, type = "prob") %>%
bind_cols(predict(final_rf_model, new_data = ufc_test, type = "class")) %>%
bind_cols(ufc_test %>% select(Winner))
# Accuracy and ROC AUC
rf_metrics <- metric_set(accuracy, roc_auc)(
rf_preds,
truth = Winner,
estimate = .pred_class,
.pred_Blue
)
rf_metrics
## # A tibble: 2 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.607
## 2 roc_auc binary 0.649
# Confusion matrix
conf_mat(rf_preds, truth = Winner, estimate = .pred_class)
## Truth
## Prediction Blue Red
## Blue 401 252
## Red 293 442

After training a Random Forest, our first tree-based model, the results were underwhelming, with an accuracy of just 60.7%. Given the model’s capacity to capture non-linear relationships and handle high-dimensional data, a stronger performance was expected. To better understand feature influence, we calculated permutation-based feature importance, which measures how much a model’s accuracy drops when each feature is randomly shuffled. Interestingly, AgeDif emerged as the most influential variable. Meanwhile, EloDif, which previously ranked higher, dropped to 6th place, with an importance score of 0.00222, meaning permuting it reduced accuracy by about 0.22 percentage points. Its proportional importance was 0.07, indicating it accounted for 7% of total model reliance. Similarly, the StyleMatchup feature had an importance score of 0.00188 and a proportional contribution of 6%, suggesting a modest but meaningful impact on predictions.
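The proportional figures quoted above are simply each feature's share of the summed importance scores. A minimal sketch (EloDif and StyleMatchup use the scores reported above; the other entries are illustrative placeholders, not values from the fitted model):

```r
# Raw permutation importances: mean accuracy drop when a feature is shuffled
imp <- c(AgeDif = 0.0150, OtherA = 0.0080, OtherB = 0.0046,
         EloDif = 0.00222, StyleMatchup = 0.00188)
# Proportional importance: each feature's share of the total accuracy drop
round(imp / sum(imp), 2)
```

With these placeholder totals, EloDif works out to 0.07 and StyleMatchup to 0.06, matching the shares quoted above.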
XGBoost
# Recipe (reuse or define if needed)
xgb_recipe <- recipe(Winner ~ ., data = ufc_train) %>%
step_impute_median(all_numeric_predictors()) %>%
step_impute_mode(all_nominal_predictors()) %>%
step_dummy(all_nominal_predictors()) %>%
step_zv(all_predictors()) %>%
step_normalize(all_predictors())
# XGBoost model spec with tuning
bt_class_spec <- boost_tree(
mtry = tune(),
trees = tune(),
learn_rate = tune()
) %>%
set_engine("xgboost") %>%
set_mode("classification")
# Workflow
bt_class_wf <- workflow() %>%
add_model(bt_class_spec) %>%
add_recipe(xgb_recipe)
# Tuning grid
bt_grid <- grid_regular(
mtry(range = c(1, 6)),
trees(range = c(200, 600)),
learn_rate(range = c(-10, -1)), # log10 scale
levels = 5
)
# Load or tune model
tune_bt_class <- tryCatch({
load("elo_cluster_xgb.rda")
tune_bt_class
}, error = function(e) {
tune_bt_class <- tune_grid(
bt_class_wf,
resamples = ufc_fold,
grid = bt_grid
)
save(tune_bt_class, file = "elo_cluster_xgb.rda")
tune_bt_class
})
# Select best model
best_bt_class <- select_best(tune_bt_class, metric = "roc_auc")
# Finalize and fit model
final_bt_model <- finalize_workflow(bt_class_wf, best_bt_class)
final_bt_model <- fit(final_bt_model, data = ufc_train)
# Predict on test set
bt_preds <- predict(final_bt_model, new_data = ufc_test, type = "prob") %>%
bind_cols(predict(final_bt_model, new_data = ufc_test, type = "class")) %>%
bind_cols(ufc_test %>% select(Winner))
# Accuracy and ROC AUC
xgb_metrics <- metric_set(accuracy, roc_auc)(
bt_preds,
truth = Winner,
estimate = .pred_class,
.pred_Blue
)
xgb_metrics
## # A tibble: 2 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.616
## 2 roc_auc binary 0.653
# Confusion matrix
conf_mat(bt_preds, truth = Winner, estimate = .pred_class)
## Truth
## Prediction Blue Red
## Blue 427 266
## Red 267 428

When evaluating the XGBoost model, we observed only a slight improvement in accuracy, reaching approximately 61.6%. Given XGBoost’s optimized boosting framework and ability to handle complex interactions, I expected features like EloDif and StyleMatchup to play a less dominant role. As anticipated, AgeDif remained the most important feature by a significant margin. EloDif ranked 6th, with an importance score of 0.054, while StyleMatchup ranked 13th, with a score of 0.022. These results suggest that although both features contribute to the model, their influence is relatively limited compared to the top predictors.
Support Vector Machine
# Recipe: normalization only (on selected predictors)
svm_rec <- recipe(Winner ~ ., data = ufc_train) %>%
step_impute_median(all_numeric_predictors()) %>%
step_dummy(all_nominal_predictors()) %>%
step_zv(all_predictors()) %>%
step_normalize(all_predictors())
# Model spec with probability output
svm_linear_spec <- svm_linear(cost = tune()) %>%
set_mode("classification") %>%
set_engine("kernlab", prob.model = TRUE)
# Workflow
svm_linear_wkflow <- workflow() %>%
add_recipe(svm_rec) %>%
add_model(svm_linear_spec)
svm_linear_grid <- grid_regular(cost(range = c(-5, 5)), levels = 5)
# Load or tune SVM Linear model
svm_linear_res <- tryCatch({
load("elo_cluster_svm.rda")
svm_linear_res
}, error = function(e) {
svm_linear_res <- tune_grid(
svm_linear_wkflow,
resamples = ufc_fold,
grid = svm_linear_grid
)
save(svm_linear_res, file = "elo_cluster_svm.rda")
svm_linear_res
})
# Select best model
svm_best_linear <- select_best(svm_linear_res, metric = "roc_auc")
# Finalize and fit
svm_final_linear_fit <- finalize_workflow(svm_linear_wkflow, svm_best_linear) %>%
fit(data = ufc_train)
## Setting default kernel parameters
# Predict on test set
svm_preds <- predict(svm_final_linear_fit, new_data = ufc_test, type = "prob") %>%
bind_cols(predict(svm_final_linear_fit, new_data = ufc_test, type = "class")) %>%
bind_cols(ufc_test %>% select(Winner))
# Accuracy and ROC AUC
svm_linear_metrics <- metric_set(accuracy, roc_auc)(
svm_preds,
truth = Winner,
estimate = .pred_class,
.pred_Blue
)
svm_linear_metrics
## # A tibble: 2 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.615
## 2 roc_auc binary 0.657
# Confusion matrix
conf_mat(svm_preds, truth = Winner, estimate = .pred_class)
## Truth
## Prediction Blue Red
## Blue 392 232
## Red 302 462

Running the SVM, I had difficulty fitting the RBF-kernel version, so I settled on the linear version instead. It performs well, with an accuracy of 61.5%.
Model Comparison
# Extract metrics from each model's results
knn_metrics <- metric_set(accuracy, roc_auc)(ufc_test_preds_knn, truth = Winner, estimate = .pred_class, .pred_Blue)
log_reg_metrics <- metric_set(accuracy, roc_auc)(ufc_test_preds, truth = Winner, estimate = .pred_class, .pred_Blue)
rf_metrics <- metric_set(accuracy, roc_auc)(rf_preds, truth = Winner, estimate = .pred_class, .pred_Blue)
xgb_metrics <- metric_set(accuracy, roc_auc)(bt_preds, truth = Winner, estimate = .pred_class, .pred_Blue)
svm_linear_metrics <- metric_set(accuracy, roc_auc)(svm_preds, truth = Winner, estimate = .pred_class, .pred_Blue)
#svm_rbf_metrics <- metric_set(accuracy, roc_auc)(svm_rbf_preds, truth = Winner, estimate = .pred_class, .pred_Blue)
# Bind and reshape all metrics
data <- bind_rows(
knn_metrics %>% mutate(Model = "KNN"),
log_reg_metrics %>% mutate(Model = "Logistic Regression"),
rf_metrics %>% mutate(Model = "Random Forest"),
xgb_metrics %>% mutate(Model = "XGBoost"),
svm_linear_metrics %>% mutate(Model = "SVM"),
# svm_rbf_metrics %>% mutate(Model = "SVM RBF")
) %>%
select(Model, .metric, .estimate) %>%
pivot_wider(names_from = .metric, values_from = .estimate)
# Reshape for plotting
plot_data <- data %>%
pivot_longer(cols = c(accuracy, roc_auc), names_to = "Metric", values_to = "Score")
# Plot grouped bar chart
ggplot(plot_data, aes(x = Model, y = Score, fill = Metric)) +
geom_col(position = "dodge") +
theme_minimal() +
labs(title = "Model Performance Comparison", x = "Model", y = "Score") +
scale_fill_brewer(palette = "Set1")

When comparing the models, it is clear that K-Nearest Neighbors (KNN) performs the worst, which is not surprising given its simplicity and lack of model complexity. Among the other four models, there is little to separate them. All achieve accuracies of around 60% and ROC AUC scores close to 65%. This indicates that while the models’ predictive accuracy is modest (only slightly better than random guessing), their discriminative ability is reasonably solid, as reflected in their ROC AUC scores. In other words, although the models struggle to consistently make correct classifications, they are relatively robust in distinguishing between classes.
Conclusion
In conclusion, through studying this dataset, I have gained a better understanding of the complexity of machine learning models, as well as explored different methods to help increase accuracy. The features EloDif and StyleMatchup contributed to improving model accuracy, even if only by a small margin. To further improve this project, there are several things I would like to pursue in the future. I was torn on implementing PCA when building the model, since doing so would inhibit my ability to deeply analyze each feature and its impact on the model. However, it could potentially lead to a more accurate and robust result. In addition, I would have liked to explore the fighter clustering more thoroughly. I currently have a surface-level understanding of how the mathematics behind clustering works, and I hope to study this subject in more depth in future projects.